Text Classification from Positive and Unlabeled Data using Misclassified Data Correction

نویسندگان

  • Fumiyo Fukumoto
  • Yoshimi Suzuki
  • Suguru Matsuyoshi
چکیده

This paper addresses the problem of dealing with a collection of labeled training documents, especially annotating negative training documents and presents a method of text classification from positive and unlabeled data. We applied an error detection and correction technique to the results of positive and negative documents classified by the Support Vector Machines (SVM). The results using Reuters documents showed that the method was comparable to the current state-of-the-art biasedSVM method as the F-score obtained by our method was 0.627 and biased-SVM was 0.614.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Semi-Supervised Text Classification Using Positive and Unlabeled Data

Text classification using positive and unlabeled data refers to the problem of building text classifier using positive documents (P) of one class and unlabeled documents (U) of many other classes. U consists of positive and negative documents. Some existing methods for solving the PU-Learning problem are building a classifier in a two-step process. Generally speaking, these existing methods do ...

متن کامل

Learning to Rank Biomedical Documents with only Positive and Unlabeled Examples: A Case Study

In the text mining field, obtaining training data requires human experts' labeling efforts, which is often time consuming and expensive. Supervised learning with only a small number of positive examples and a large amount of unlabeled data, which is easy to get, has attracted booming interests in the field. A recently proposed relabeling method, which assumes unlabeled data as negative data for...

متن کامل

Text Classification and Co-training from Positive and Unlabeled Examples

In the general framework of semi-supervised learning from labeled and unlabeled data, we consider the specific problem of learning from a pool of positive data, without any negative data but with the help of unlabeled data. We study a naive Bayes algorithm PNB from positive and unlabeled examples. Then, we consider the case where the number of positive examples is quite small, assuming that the...

متن کامل

Using Error-Correcting Codes with Co-Training for text classification with a large number of categories

A major concern with supervised learning techniques for text classification is that they often require a large number of labeled examples to learn accurately. One way to reduce the amount of labeled data required is to develop algorithms that can learn effectively from a small number of labeled examples augmented with a large number of unlabeled examples. In this paper, we develop a framework t...

متن کامل

A Novel One Sided Feature Selection Method for Imbalanced Text Classification

The imbalance data can be seen in various areas such as text classification, credit card fraud detection, risk management, web page classification, image classification, medical diagnosis/monitoring, and biological data analysis. The classification algorithms have more tendencies to the large class and might even deal with the minority class data as the outlier data. The text data is one of t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013